3  Sites

Published

April 2, 2025

Overview

The sites dataset is a core component of the Pristine Seas BigQuery database. It contains a collection of method-specific site tables that document the what, where, when, and who of fieldwork conducted during each scientific expedition. These tables provide a high-level summary of survey locations and sampling activity, capturing essential spatial, temporal, and logistical metadata.

Site tables are not designed to store method-specific sampling events such as transects, deployments, or replicates — those are handled separately in the corresponding method.stations tables within each method’s dataset.

A site represents a unique point in space and time where one or more scientific survey methods were conducted. Each site is uniquely identified by a standardized ps_site_id and serves as the fundamental spatial-temporal unit across the Pristine Seas database.

A site may contain one or more stations, each representing a specific sampling event. Stations may differ by:

  • Method (e.g., fish BLT vs. benthic LPI conducted at the same UVS site)
  • Depth stratum (e.g., submersible transects at different depths)
  • Replicate (e.g., multiple pelagic BRUVS rigs deployed at a single site)

This hierarchical structure allows for rich, scalable representation of spatially and methodologically diverse sampling events.
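The site-to-station hierarchy can be pictured with a small in-memory model. This is only an illustrative sketch: the `Site` and `Station` classes, their attribute names, and the depth labels are assumptions made for the example, not the actual table definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Station:
    """One sampling event at a site (hypothetical structure)."""
    method: str                          # e.g., "blt" or "lpi"
    depth_stratum: Optional[str] = None  # illustrative label, not a real vocabulary
    replicate: int = 1


@dataclass
class Site:
    """A unique point in space and time, identified by ps_site_id."""
    ps_site_id: str
    stations: List[Station] = field(default_factory=list)

    def methods(self) -> set:
        """Distinct methods sampled at this site."""
        return {s.method for s in self.stations}


# One UVS site holding three stations that differ by method and depth stratum.
site = Site("PNG_2024_uvs_001")
site.stations += [
    Station(method="blt", depth_stratum="shallow"),
    Station(method="blt", depth_stratum="deep"),
    Station(method="lpi", depth_stratum="shallow"),
]
print(sorted(site.methods()))  # ['blt', 'lpi']
```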

Core Site Schema

All site tables in the database share a core schema that defines the essential spatial and temporal metadata for each sampling location (Table 3.1). These fields represent the what, where, when, and who of data collection and are required across all site tables, regardless of method.

This standardized structure enables consistent quality control, supports spatial and temporal analysis, and facilitates integration of data across methods, expeditions, and years.

Table 3.1: Core Site Schema
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| Core | exp_id | STRING | TRUE | Unique expedition identifier in the format `ISO3_YYYY` (e.g., `PNG_2024`). |
| Core | leg | STRING | TRUE | Cruise leg or operational phase (e.g., Leg 1, Caribbean vs. Pacific). |
| Core | survey_type | STRING | TRUE | Type of survey conducted. Allowed values: `uvs`, `sbruvs`, `pbruvs`, `sub`, `rov`, `dscm`, `bird`, `ysi`, `edna`. |
| Core | ps_site_id | STRING | TRUE | Unique Pristine Seas site ID in the format `ISO3_YYYY_survey_###` (e.g., `PNG_2024_uvs_001`). |
| Core | location | STRING | TRUE | General area of the site (e.g., Gulf of Tribugá, Three Sisters, Duff Islands). |
| Core | sublocation | STRING | TRUE | Finer-scale geographic area within the location, such as an island, atoll, or bay (e.g., Ensenada de Utría, Bajo Nuevo). |
| Core | date | DATE | TRUE | Sampling date in `YYYY-MM-DD` format. |
| Core | time | TIME | TRUE | Local time of sampling in 24-hour `HH:MM` format (e.g., `14:30`). |
| Core | lat | FLOAT | TRUE | Latitude in decimal degrees, WGS84 (e.g., `-0.7512`). Negative = south. |
| Core | lon | FLOAT | TRUE | Longitude in decimal degrees, WGS84 (e.g., `-91.0812`). Negative = west. |
| Core | team_lead | STRING | TRUE | Name of team lead or responsible field scientist. |
| Core | notes | STRING | FALSE | Free-text notes describing the site. |
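The constraints in Table 3.1 lend themselves to a simple row-level check before data are loaded. The sketch below is not the project's quality-control code. The function name `check_core_site` is hypothetical, and two rules are assumptions inferred from the examples: that `###` means exactly three digits, and that the prefix of `ps_site_id` must repeat the row's `exp_id` and `survey_type`. The sample coordinates, date, and team lead are made up for illustration.

```python
import re
from datetime import date, time

SURVEY_TYPES = {"uvs", "sbruvs", "pbruvs", "sub", "rov", "dscm", "bird", "ysi", "edna"}
REQUIRED = ["exp_id", "leg", "survey_type", "ps_site_id", "location",
            "sublocation", "date", "time", "lat", "lon", "team_lead"]
EXP_ID_RE = re.compile(r"[A-Z]{3}_\d{4}")                    # ISO3_YYYY
SITE_ID_RE = re.compile(r"([A-Z]{3}_\d{4})_([a-z]+)_(\d{3})")  # assumes 3 digits


def check_core_site(row: dict) -> list:
    """Return a list of problems found in one site row (empty if none)."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED if row.get(name) in (None, "")]

    exp_id = str(row.get("exp_id") or "")
    survey_type = str(row.get("survey_type") or "")
    if not EXP_ID_RE.fullmatch(exp_id):
        problems.append("exp_id is not in ISO3_YYYY format")
    if survey_type not in SURVEY_TYPES:
        problems.append("survey_type is not an allowed value")

    match = SITE_ID_RE.fullmatch(str(row.get("ps_site_id") or ""))
    if match is None:
        problems.append("ps_site_id is not in ISO3_YYYY_survey_### format")
    elif match.group(1) != exp_id or match.group(2) != survey_type:
        # Assumption: the ID prefix mirrors exp_id and survey_type.
        problems.append("ps_site_id does not agree with exp_id/survey_type")

    lat, lon = row.get("lat"), row.get("lon")
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        problems.append("lat is outside [-90, 90]")
    if not isinstance(lon, (int, float)) or not -180 <= lon <= 180:
        problems.append("lon is outside [-180, 180]")
    return problems


good = {
    "exp_id": "COL_2022", "leg": "Leg 1", "survey_type": "uvs",
    "ps_site_id": "COL_2022_uvs_001", "location": "Gulf of Tribugá",
    "sublocation": "Ensenada de Utría", "date": date(2022, 3, 15),
    "time": time(14, 30), "lat": 6.0, "lon": -77.35,
    "team_lead": "A. Scientist",  # placeholder values for illustration
}
bad = dict(good, ps_site_id="COL_2022_sub_001", lat=95.0)

print(check_core_site(good))  # []
print(check_core_site(bad))
```

Returning a list of messages rather than raising on the first failure makes it easier to report every issue in a field sheet at once.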

Method-Specific Tables

While all site tables share a standardized core schema, each sampling method introduces additional fields that capture metadata unique to that method. These method-specific fields provide essential contextual detail such as depth, platform type, habitat classification, or deployment parameters.

The following method-specific site tables are currently included in the sites dataset.

Underwater Visual Surveys

Underwater Visual Survey (UVS) sites represent the core spatial unit for SCUBA-based survey methods conducted during Pristine Seas expeditions. These methods include fish belt transects (BLT), benthic line point intercept (LPI), invertebrate counts, coral recruit surveys, and others.

In addition to the core site fields, the uvs_sites table includes two key controlled fields used to provide ecological and environmental interpretation of each site. These are:

  • habitat:
    • fore reef: Outer slope of a reef, typically high-energy and wave-exposed.
    • back reef: Protected area behind the reef crest, often calmer and more sheltered.
    • fringing reef: Reef structure that grows directly from the shoreline.
    • patch reef: Isolated, often small reef outcrops within a lagoon or sandy area.
    • reef flat: Shallow, flat section of a reef, often exposed at low tide.
    • channel: Natural passage between reef structures or through atolls.
    • seagrass: Shallow marine habitat dominated by seagrass beds.
    • rocky reef: Hard-bottom habitat composed primarily of rock.
    • other: Habitat that does not fit predefined categories.
  • exposure:
    • windward: Side of the island or reef facing prevailing winds and wave energy. Typically higher energy environments with more exposure to ocean swell.
    • leeward: Sheltered side, facing away from prevailing winds. Typically calmer, with reduced wave action.
    • lagoon: Located within a lagoon system, protected from direct oceanic exposure. Often shallow and calm, with restricted circulation.
    • other: Exposure type does not fit standard categories (e.g., enclosed bays).

Additional fields include a site_name (often used for repeat surveys), the name of the local community (where relevant), protection status, and flags indicating which UVS sub-methods were conducted at each site (Table 3.2).

Table 3.2: Additional fields to the core site fields in the uvs_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| uvs | site_name | STRING | FALSE | Site name used in prior surveys or local knowledge (e.g., TNC_2000_001, Punta Esperanza). |
| uvs | habitat | STRING | TRUE | Dominant habitat type. Allowed values: *fore reef*, *back reef*, *fringing reef*, *patch reef*, *reef flat*, *lagoon patch reef*, *channel*, *seagrass*, *rocky reef*, *other*. |
| uvs | exposure | STRING | TRUE | Wind and wave exposure at the site. Allowed values: *windward*, *leeward*, *lagoon*, *other*. |
| uvs | community | STRING | FALSE | Nearest local community or population center to the site. |
| uvs | protected | BOOLEAN | FALSE | Whether the site is within a marine protected area (MPA) or Tambu. |
| uvs | blt | BOOLEAN | FALSE | Whether fish belt transects were done at this site. |
| uvs | lpi | BOOLEAN | FALSE | Whether benthic line point intercept (LPI) transects were done at this site. |
| uvs | ysi | BOOLEAN | FALSE | Whether a YSI environmental profile was done at this site. |
| uvs | inverts | BOOLEAN | FALSE | Whether invertebrate surveys were done at this site. |
| uvs | recruits | BOOLEAN | FALSE | Whether coral recruit surveys were done at this site. |
| uvs | e_dna | BOOLEAN | FALSE | Whether eDNA samples were collected at this site. |
| uvs | photomosaic | BOOLEAN | FALSE | Whether photomosaic imagery was collected at this site. |

eDNA

The edna_sites table contains one row per environmental DNA (eDNA) sampling site. Each site represents a distinct point in space and time and serves as the primary spatial unit for eDNA fieldwork. Within a site, multiple water samples (replicates) may be collected across different depth strata, recorded in the corresponding edna.stations table.

In addition to the core site fields, the edna_sites table includes method-specific metadata (Table 3.3), such as:

  • exposure – Same controlled vocabulary as in uvs_sites
  • habitat – Same as uvs_sites, with the following additional categories:
    • open water – Offshore or pelagic environments
    • bay – Semi-enclosed coastal embayments
    • estuary – Transitional area between river and marine systems
    • mangrove – Shallow, intertidal forested coastal habitat

Table 3.3: Additional fields to the core site fields in the edna_sites table

| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| edna | habitat | STRING | TRUE | Dominant habitat type. Allowed values: *fore reef*, *back reef*, *fringing reef*, *patch reef*, *reef flat*, *channel*, *seagrass*, *rocky reef*, *open water*, *bay*, *estuary*, *mangrove*, *other*. |
| edna | exposure | STRING | TRUE | Wind and wave exposure at the site. Allowed values: *windward*, *leeward*, *lagoon*, *other*. |
| edna | paired_ps_site_id | STRING | FALSE | `ps_site_id` of a paired site (e.g., a `uvs` or `pbruvs` site), if applicable. |
| edna | n_stations | INTEGER | TRUE | Number of unique stations (i.e., depth strata) sampled at the site. |
| edna | n_samples | INTEGER | TRUE | Total number of water samples (replicates) collected at the site. |
| edna | site_photos | STRING | FALSE | Path to associated site photos, if available (e.g., eDNA/site_photos/COL-2022-edna-001). |
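As a rough illustration of how `n_stations` and `n_samples` relate to the underlying samples, the sketch below derives both counts from a list of sample records. The record layout (the `ps_site_id` and `depth_stratum` keys), the stratum labels, and the function name are assumptions for the example. The chapter does not specify how these fields are actually populated.

```python
from collections import defaultdict


def summarize_edna_samples(samples):
    """Count distinct depth strata and total samples per site."""
    strata = defaultdict(set)
    counts = defaultdict(int)
    for sample in samples:
        site_id = sample["ps_site_id"]
        strata[site_id].add(sample["depth_stratum"])  # one station per stratum
        counts[site_id] += 1                          # every replicate counts
    return {
        site_id: {"n_stations": len(strata[site_id]), "n_samples": counts[site_id]}
        for site_id in counts
    }


# Hypothetical samples: three surface replicates and two at a deeper stratum.
samples = [
    {"ps_site_id": "COL_2022_edna_001", "depth_stratum": "surface"},
    {"ps_site_id": "COL_2022_edna_001", "depth_stratum": "surface"},
    {"ps_site_id": "COL_2022_edna_001", "depth_stratum": "surface"},
    {"ps_site_id": "COL_2022_edna_001", "depth_stratum": "20m"},
    {"ps_site_id": "COL_2022_edna_001", "depth_stratum": "20m"},
]
summary = summarize_edna_samples(samples)
print(summary["COL_2022_edna_001"])  # {'n_stations': 2, 'n_samples': 5}
```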

Seabed BRUVS

The sbruvs_sites table contains one row per seabed Baited Remote Underwater Video (sBRUV) deployment site. These sites represent individual stationary stereo-video deployments, typically conducted at depths from 10 to 70 meters.

Each site corresponds to a single BRUV deployment, meaning that site and station are effectively one-to-one for this method.

In addition to the core site schema, the sbruvs_sites table includes method-specific descriptors (Table 3.4):

  • habitat – Same controlled vocabulary as uvs_sites, with the following additional values: bay, estuary, mangrove, sand flat
  • exposure – Same vocabulary as uvs_sites.

Deployment-specific details such as depth, rig ID, and camera identifiers are stored in the associated sbruvs.stations table.

Table 3.4: Additional fields to the core site fields in the sbruvs_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| sbruvs | habitat | STRING | TRUE | Simplified habitat classification at the site. Allowed values: *fore reef*, *back reef*, *fringing reef*, *patch reef*, *reef flat*, *channel*, *seagrass*, *rocky reef*, *bay*, *estuary*, *mangrove*, *sand flat*, *other*. |
| sbruvs | exposure | STRING | TRUE | Wind and wave exposure at the site. Allowed values: *windward*, *leeward*, *lagoon*, *other*. |
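Because the UVS, eDNA, and seabed BRUVS habitat vocabularies share a common base and differ only in a few extra values, they can be expressed as set unions. The following sketch takes the allowed values from Tables 3.2 to 3.4. The names `BASE_HABITATS`, `HABITATS`, and `is_valid` are hypothetical, and exact lowercase matching is an assumption.

```python
# Habitat values common to all three tables.
BASE_HABITATS = frozenset({
    "fore reef", "back reef", "fringing reef", "patch reef", "reef flat",
    "channel", "seagrass", "rocky reef", "other",
})

# Method-specific vocabularies: the shared base plus each table's extra values.
HABITATS = {
    "uvs": BASE_HABITATS | {"lagoon patch reef"},
    "edna": BASE_HABITATS | {"open water", "bay", "estuary", "mangrove"},
    "sbruvs": BASE_HABITATS | {"bay", "estuary", "mangrove", "sand flat"},
}

# The exposure vocabulary is identical for all three methods.
EXPOSURES = frozenset({"windward", "leeward", "lagoon", "other"})


def is_valid(survey_type: str, habitat: str, exposure: str) -> bool:
    """Check a habitat/exposure pair against the method's vocabulary."""
    return habitat in HABITATS[survey_type] and exposure in EXPOSURES


print(is_valid("edna", "open water", "lagoon"))   # True
print(is_valid("uvs", "open water", "windward"))  # False
```

Keeping the shared values in one place means a change to the base vocabulary propagates to every method that builds on it.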

Pelagic BRUVS

Pelagic Baited Remote Underwater Video (pBRUV) sites represent open-water deployments of stereo-video systems used to survey pelagic fish communities. Each site corresponds to a single 5-rig deployment set, with each rig treated as a separate station. As such, the pbruvs_sites table contains one row per deployment set, while rig-specific data are recorded in the corresponding pbruvs.stations table.

In addition to the core site schema, the pbruvs_sites table summarizes deployment metadata across all five rigs in a standardized way (Table 3.5):

  • n_rigs – Number of rigs deployed (typically 5)
  • drift_m – Mean drift distance (meters) across rigs
  • drift_hrs – Mean soak time (hours)
  • uwa_string_id – String (site) identifier used by the University of Western Australia

Latitude and longitude represent the mean start position across all rigs, and time fields reflect the start time of the first rig. These values provide a spatial-temporal summary of the full deployment set.

Table 3.5: Additional fields to the core site fields in the pbruvs_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| pbruvs | n_rigs | INTEGER | TRUE | Number of rigs deployed at the site (typically 5). |
| pbruvs | drift_m | FLOAT | TRUE | Mean drift distance across all rigs, in meters (m). |
| pbruvs | drift_hrs | FLOAT | TRUE | Mean deployment duration across all rigs, in hours (h). |
| pbruvs | uwa_string_id | STRING | TRUE | String (site) identifier used by the University of Western Australia. |
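The site-level summary of a pelagic deployment set (mean start position, start time of the first rig, mean drift distance and duration) could be computed along the following lines. This is a sketch under stated assumptions: the per-rig keys `lat_start`, `lon_start`, and `time_start` are hypothetical names, all rigs are assumed to start on the same calendar day, and a plain arithmetic mean of longitude is assumed to be acceptable (it would not be for a set straddling the antimeridian).

```python
from datetime import time
from statistics import mean


def summarize_pbruvs_set(rigs):
    """Collapse per-rig records into one site-level summary row."""
    return {
        "n_rigs": len(rigs),
        "lat": mean(r["lat_start"] for r in rigs),    # mean start latitude
        "lon": mean(r["lon_start"] for r in rigs),    # mean start longitude
        "time": min(r["time_start"] for r in rigs),   # first rig in the water
        "drift_m": mean(r["drift_m"] for r in rigs),
        "drift_hrs": mean(r["drift_hrs"] for r in rigs),
    }


# Five illustrative rigs deployed five minutes apart (made-up values).
rigs = [
    {"lat_start": -0.70 - 0.01 * i, "lon_start": -91.00 - 0.01 * i,
     "time_start": time(8, 5 * i), "drift_m": d, "drift_hrs": 2.0}
    for i, d in enumerate([1000.0, 1200.0, 1100.0, 900.0, 800.0])
]
site_row = summarize_pbruvs_set(rigs)
print(site_row["n_rigs"], site_row["time"], site_row["drift_m"])
```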

Birds

The bird_sites table contains one row per seabird survey transect. Each site represents the starting location and time of a vessel- or land-based transect during which seabird observations were recorded. Each site corresponds to a single station, representing the full transect.

Although transects are mobile, the ps_site_id is anchored to the start point of the transect to provide consistent spatial referencing across the dataset.

In addition to the core site schema, the bird_sites table includes a site-level descriptor for habitat, using a custom controlled vocabulary tailored to these surveys:

  • open ocean – Offshore transects over deep water, far from land or coastal influence
  • coastal – Nearshore waters along mainland or island coastlines
  • inshore – Sheltered bays, estuaries, or nearshore zones with limited wave exposure
  • island – Terrestrial habitats on offshore islands, often with seabird nesting colonies
  • inland – Land-based habitats far from marine influence (e.g., wetlands, forest, grassland)
  • other – Rare or unique environments not captured by the categories above

Transect-specific metadata — including platform type, duration, distance traveled, and species observations — are stored in the corresponding birds.stations and birds.observations tables.

Table 3.6: Additional fields to the core site fields in the bird_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| birds | habitat | STRING | TRUE | Broad classification of the survey environment. Allowed values: *open ocean*, *coastal*, *inshore*, *island*, *inland*, *other*. |

ROV

Each ROV (Remotely Operated Vehicle) deployment is represented by a single site with one or more associated stations. The site corresponds to the full ROV dive (deployment), while each station represents a horizontal transect or observational segment within the dive. This structure follows the standard Pristine Seas convention: sites capture high-level spatial and temporal metadata, while stations contain transect-specific sampling and observation data.

The rov_sites table records the core spatial and temporal metadata for each ROV deployment. Deployment start time (time_deploy) and coordinates (lat_deploy, lon_deploy) are used to populate the standardized core fields time, lat, and lon, ensuring consistency across methods.

Method-specific metadata—such as recovery time and coordinates, dive_type, max_depth_m, duration, and highlights—are retained within the rov_sites table (Table 3.7).

Transect-specific information, including start/end depth, time, coordinates, and observation notes, is stored in the corresponding rov.stations table.

Table 3.7: Additional fields to the core site fields in the rov_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| rov | dive_type | STRING | FALSE | Purpose of the dive (e.g., transect, exploration, sample collection). |
| rov | time_deploy | TIME | TRUE | Time ROV left the surface. |
| rov | lat_deploy | FLOAT | TRUE | Latitude at ROV deployment. |
| rov | lon_deploy | FLOAT | TRUE | Longitude at ROV deployment. |
| rov | time_recovery | TIME | FALSE | Time ROV returned to the surface. |
| rov | lat_recovery | FLOAT | FALSE | Latitude at ROV recovery. |
| rov | lon_recovery | FLOAT | FALSE | Longitude at ROV recovery. |
| rov | max_depth_m | FLOAT | TRUE | Maximum depth reached during the dive. |
| rov | duration | TIME | FALSE | Total duration of the dive (hh:mm:ss). |
| rov | highlights | STRING | FALSE | Narrative summary or scientific highlights of the dive. |
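One plausible way to fill `duration` is as the difference between `time_recovery` and `time_deploy`. The chapter does not say how the field is derived, so treat the sketch below as an assumption. The function name `dive_duration` is hypothetical, and because both inputs are time-of-day values, a recovery time earlier than the deployment time is assumed to mean the dive crossed midnight.

```python
from datetime import date, datetime, time, timedelta


def dive_duration(time_deploy: time, time_recovery: time) -> str:
    """Return elapsed time between deployment and recovery as hh:mm:ss."""
    anchor = date(2000, 1, 1)  # arbitrary date, only needed for subtraction
    start = datetime.combine(anchor, time_deploy)
    end = datetime.combine(anchor, time_recovery)
    if end < start:
        end += timedelta(days=1)  # assumption: dive ran past midnight
    total = int((end - start).total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(dive_duration(time(9, 15), time(11, 45, 30)))  # 02:30:30
print(dive_duration(time(23, 30), time(0, 45)))      # 01:15:00
```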

Submersible

Each submersible dive is represented by a single site with one or more associated stations. The site corresponds to the entire submersible deployment (dive), while each station represents a horizontal transect or visual survey segment conducted during that dive.

The sub_sites table captures the spatial, temporal, and operational context of each dive. In addition to the standardized core fields shared across all site tables, it includes method-specific metadata relevant to submersible operations (Table 3.8), such as the submersible name (sub_name), dive_type (e.g., science, media, policy), depth_max_m, observers, pilot, and timestamps and positions for key waypoints (e.g., time on bottom, surface recovery).

To maintain alignment with the shared site schema:

  • The start of descent provides the time, lat, and lon used in the core fields.
  • The primary scientific observer (observer_1) is mapped to team_lead.

Transect-specific information, such as start/end depth, time, habitat descriptions, and notes, is stored in the corresponding sub.stations table.

Table 3.8: Additional fields to the core site fields in the sub_sites table
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
| Submersible | sub_name | STRING | TRUE | Name of submersible used (e.g., Argonauta or DeepSee). |
| Submersible | dive_number | STRING | FALSE | Running sub dive number. |
| Submersible | depth_max_m | FLOAT | TRUE | Maximum depth reached (m). |
| Submersible | duration | TIME | TRUE | Total dive duration (hh:mm:ss). |
| Submersible | temp_max_depth_c | FLOAT | FALSE | Temperature at maximum depth (°C). |
| Submersible | observer_1 | STRING | FALSE | Primary scientific observer. |
| Submersible | observer2 | STRING | FALSE | Secondary observer (if any). |
| Submersible | pilot | STRING | FALSE | Submersible pilot. |
| Submersible | dive_type | STRING | FALSE | Type of dive. Allowed values: science, media, policy, training. |
| Submersible | collection | BOOLEAN | FALSE | Whether any biological collection occurred. |
| Submersible | transect | BOOLEAN | FALSE | Whether transects were conducted. |
| Submersible | edna | BOOLEAN | FALSE | Whether eDNA samples were collected. |
| Submersible | time_descent | TIME | TRUE | Time when sub began descent. |
| Submersible | lat_descent | FLOAT | TRUE | Latitude at start of descent. |
| Submersible | lon_descent | FLOAT | TRUE | Longitude at start of descent. |
| Submersible | time_on_bottom | TIME | FALSE | Time of first bottom contact. |
| Submersible | lat_on_bottom | FLOAT | FALSE | Latitude at bottom contact. |
| Submersible | lon_on_bottom | FLOAT | FALSE | Longitude at bottom contact. |
| Submersible | time_off_bottom | TIME | FALSE | Time when sub left the bottom. |
| Submersible | lat_off_bottom | FLOAT | FALSE | Latitude at lift-off. |
| Submersible | lon_off_bottom | FLOAT | FALSE | Longitude at lift-off. |
| Submersible | time_surface | TIME | FALSE | Time when sub surfaced. |
| Submersible | lat_surface | FLOAT | FALSE | Latitude at surface recovery. |
| Submersible | lon_surface | FLOAT | FALSE | Longitude at surface recovery. |

Deep-Sea Cameras

Each deep-sea camera deployment is represented by a single site–station pair. In line with the Pristine Seas schema, the site captures the spatial and contextual metadata of the deployment, while the station represents the full observational unit — including technical specifications, environmental conditions, and recording parameters.

The dscm_sites table records the core spatial and temporal metadata for each deployment. Deployment time (time_deploy) and coordinates (lat_deploy, lon_deploy) populate the standard core fields time, lat, and lon, following conventions used across all methods.

Deployment-specific details — such as max_depth, bottom temperature, ambient water temperature, recovery time and position, and recording duration — are stored in the corresponding dscm.stations table.